50 research outputs found

    Exploiting Lexical Resources for Therapeutic Purposes: the Case of WordNet and STaRS.sys

    In this paper, we present an ongoing project aiming at extending the WordNet lexical database by encoding commonsense featural knowledge elicited from language speakers. This extension of WordNet is required in the framework of the STaRS.sys project, whose goal is to build tools that support speech therapists in preparing rehabilitation exercises for aphasic patients. We review some preliminary results and illustrate which extensions of the existing WordNet model are needed to accommodate the encoding of commonsense (featural) knowledge

    Modelling the Meaning of Argument Constructions with Distributional Semantics

    Current computational models of argument constructions typically represent their semantic content with hand-made formal structures. Here we present a distributional model implementing the idea that the meaning of a construction is intimately related to the semantics of its typical verbs. First, we identify the typical verbs occurring with a given syntactic construction and build their distributional vectors. We then calculate the weighted centroid of these vectors in order to derive the distributional signature of a construction. In order to assess the goodness of our approach, we replicated the priming effect described by Johnson and Goldberg (2013) as a function of the semantic distance between a construction and its prototypical verbs. Additional support for our view comes from a regression analysis showing that our distributional information can be used to model behavioral data collected with a crowdsourced elicitation experiment
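    The centroid step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `weighted_centroid` and `cosine`, the toy two-dimensional vectors, and the choice of weights are all hypothetical, and the abstract does not specify how the weights are computed.

```python
import math


def weighted_centroid(vectors, weights):
    """Return the weighted average of equal-length vectors.

    `vectors` stands in for the distributional vectors of a construction's
    typical verbs; `weights` stands in for their (assumed) association
    strength with the construction.
    """
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per vector and at least one vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy example: two hypothetical verb vectors, the first weighted three times
# as heavily as the second.
verb_vectors = [[1.0, 0.0], [0.0, 1.0]]
verb_weights = [3, 1]
signature = weighted_centroid(verb_vectors, verb_weights)
```

    In a sketch like this, the semantic distance between a construction and a verb could then be estimated by comparing `signature` with the verb's vector using `cosine`.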

    A Feature Type Classification for Therapeutic Purposes: A Preliminary Evaluation with Non-Expert Speakers

    We propose a feature type classification intended for use in a therapeutic context. This scenario lies behind our need for an easily usable and cognitively plausible classification. Nevertheless, our proposal has both practical and theoretical outcomes, and its applications range from computational linguistics to psycholinguistics. An evaluation through inter-coder agreement has been performed to highlight the strengths of our proposal and to identify some improvements for the future

    "Beware the Jabberwock, dear reader!" Testing the distributional reality of construction semantics

    Notwithstanding the success of the notion of construction, the computational tradition still lacks a way to represent the semantic content of these linguistic entities. Here we present a simple corpus-based model implementing the idea that the meaning of a syntactic construction is intimately related to the semantics of its typical verbs. It is a two-step process that starts by identifying the typical verbs occurring with a given syntactic construction and building their distributional vectors. We then calculate the weighted centroid of these vectors in order to derive the distributional signature of a construction. In order to assess the goodness of our approach, we replicated the priming effect described by Johnson and Goldberg (2013) as a function of the semantic distance between a construction and its prototypical verbs. Additional support for our view comes from a regression analysis showing that our distributional information can be used to model behavioral data collected with a crowdsourced elicitation experiment

    CAPISCO@CONcreTEXT 2020: (Un)supervised Systems to Contextualize Concreteness with Norming Data

    This paper describes several approaches to the automatic rating of the concreteness of concepts in context, developed for the EVALITA 2020 “CONcreTEXT” task. Our systems focus on the interplay between words and their surrounding context by (i) exploiting annotated resources, (ii) using BERT masking to find potential substitutes of the target in specific contexts and measuring their average similarity with concrete and abstract centroids, and (iii) automatically generating labelled datasets to fine-tune transformer models for regression. All the approaches have been tested on both English and Italian data. The best system for each language ranked second in the task

    Encoding Commonsense Lexical Knowledge into WordNet

    In this paper, we propose an extension of the WordNet conceptual model, with the final purpose of encoding the commonsense lexical knowledge associated with words used in everyday life. The extended model has been defined starting from the short descriptions generated by naïve speakers in relation to target concepts (i.e. feature norms). Even if this proposal has been developed primarily for therapeutic purposes, it can be seen as a generalization of the original WordNet model that takes into account a much wider and more systematic set of semantic relations. The extended model is also an enhancement of the psycholinguistic vocation of the WordNet model. A featural representation of concepts is nowadays assumed by most models of human semantic memory. To test our proposal, we conducted a feature elicitation experiment and collected descriptions of 50 concepts from 60 participants. Problematic issues related to the encoding of this information into WordNet are discussed and preliminary results are presented

    Machine Learning Algorithm for the Scansion of Old Saxon Poetry

    Several scholars have designed tools to perform the automatic scansion of poetry in many languages, but none of these tools deals with Old Saxon or Old English. This project aims to be a first attempt to create a tool for these languages. We implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and used the resulting corpus as a labeled dataset to train the model. In the evaluation, the algorithm reached 97% accuracy and a weighted average of 99% for precision, recall and F1 score. In addition, we tested the model with some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses

    Dotare il sardo di dati normativi su età d’acquisizione, familiarità e accordo sul concetto: Uno studio preliminare con 50 figure di Snodgrass & Vanderwart (1980)

    In the present work, some normative data specifically relating to the Sardinian language were obtained on a set of 50 pictures taken from the famous study by Snodgrass & Vanderwart (1980). The parameters on which these normative data were obtained are some of the most studied in the literature: Age of Acquisition (AoA), Familiarity (FAM), and Concept Agreement (CA). 106 Sardinian native speakers took part in the experiment, carried out completely in written form via an online platform. In addition to providing, for each of the 50 images, normative data on the aforementioned parameters, this work found that AoA and FAM are strongly negatively correlated indicators; a correlation was also observed between both these parameters and the Concept Agreement measure, although these correlations are decidedly more moderate. A comparison was also made between the results of this work and those of two studies that collected normative data for Italian on the same parameters: Nisi et al. (2000) and Dell’Acqua et al. (2000). It was observed that Sardinian participants judged the depicted objects as significantly more familiar, and they claimed that they had learned the words denoting those objects significantly earlier. As for the CA, on the other hand, the data on Italian show a significantly higher percentage on average. However, while for AoA and FAM a strong positive correlation was found between the data on Italian and those on Sardinian, the data on these two languages are clearly uncorrelated for CA, suggesting that the degree of ease in finding a valid name for a picture is dictated by different factors in a national language such as Italian compared to a local language such as Sardinian. More generally, this shows that, before carrying out picture-naming tasks in a given language, it is advisable to have specific normative data for that language, even if it is a minority language or a dialect

    Lexical Variability and Compositionality: Investigating Idiomaticity with Distributional Semantic Models

    In this work we carried out an idiom type identification task on a set of 90 Italian V-NP and V-PP constructions comprising both idioms and non-idioms. Lexical variants were generated from these expressions by replacing their components with semantically related words extracted distributionally and from the Italian section of MultiWordNet. In distributional semantic spaces, idiomatic phrases turned out to be less similar to their lexical variants than non-idiomatic ones were. Different variant-based distributional measures of idiomaticity were tested. Our indices also proved reliable in identifying those idioms whose lexical variants are poorly attested or not attested at all in our corpus
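    One way to picture a variant-based measure of the kind mentioned in this abstract is the mean cosine similarity between an expression's vector and the vectors of its lexical variants, where a lower score suggests a more idiomatic expression. The sketch below is an assumption made for illustration only: the abstract does not say which measures were tested, and the names `cosine` and `mean_variant_similarity` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_variant_similarity(expression_vector, variant_vectors):
    """Average cosine similarity between an expression and its variants.

    Assumed reading: low values mean the expression behaves differently
    from its lexical variants, which is taken here as a sign of idiomaticity.
    """
    if not variant_vectors:
        raise ValueError("at least one variant vector is required")
    sims = [cosine(expression_vector, v) for v in variant_vectors]
    return sum(sims) / len(sims)


# Toy example: one variant identical in direction, one orthogonal.
score = mean_variant_similarity([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0]])
```

    Expressions could then be ranked by such a score, with the lowest-scoring ones treated as idiom candidates.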